منابع مشابه
Text Indexing with Errors
In this paper we address the problem of constructing an index for a text document or a collection of documents to answer various questions about the occurrences of a pattern when allowing a constant number of errors. In particular, our index can be built to report all occurrences, all positions, or all documents where a pattern occurs in time linear in the size of the query string and the numbe...
متن کاملCompressed Text Indexing with Wildcards
Let T = T1φ 1T2φ k2 · · ·φdTd+1 be a text of total length n, where characters of each Ti are chosen from an alphabet Σ of size σ, and φ denotes a wildcard symbol. The text indexing with wildcards problem is to index T such that when we are given a query pattern P , we can locate the occurrences of P in T efficiently. This problem has been applied in indexing genomic sequences that contain singl...
متن کاملText Clustering with Random Indexing
This project explored how the language technology method Random Indexing can be used for clustering of texts from Swedish newspapers. The resulting Random Indexing based representation yields similar results as an ordinary representation when the number of clusters matches the real categories. With an increased number of clusters the Random Index based representation yields better results than ...
متن کاملSuccincter Text Indexing with Wildcards
We study the problem of indexing text with wildcard positions, motivated by the challenge of aligning sequencing data to large genomes that contain millions of single nucleotide polymorphisms (SNPs)—positions known to differ between individuals. SNPs modeled as wildcards can lead to more informed and biologically relevant alignments. We improve the space complexity of previous approaches by giv...
متن کاملIndexing Text with Approximate q-Grams
We present a new index for approximate string matching. The index collects text q-samples, that is, disjoint text substrings of length q, at fixed intervals and stores their positions. At search time, part of the text is filtered out by noticing that any occurrence of the pattern must be reflected in the presence of some text q-samples that match approximately inside the pattern. Hence the inde...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Journal of Discrete Algorithms
سال: 2007
ISSN: 1570-8667
DOI: 10.1016/j.jda.2006.11.001